96 research outputs found

Cancer Molecular Analysis Project: Weaving a rich cancer research tapestry

Author: Buetow Kenneth H
Fine Howard
Kaplan Richard
Klausner Richard D
Singer Dinah S
Strausberg Robert L
Publication venue: Cell Press.
Publication date: 31/05/2002

The Cancer Molecular Analysis Project (CMAP) of the NCI is integrating diverse cancer research data to elucidate fundamental etiologic processes, enable development of novel therapeutic approaches, and facilitate the bridging of basic and clinical science

Elsevier - Publisher Connector

Genetic Variation in an Individual Human Exome

Author: Axelrod Nelson
Busam Dana A.
Huang Jiaqi
Levy Samuel
Li Kelvin
Ng Pauline C.
Stockwell Timothy B.
Strausberg Robert L.
Venter J. Craig
Walenz Brian P.
Publication venue: Public Library of Science
Publication date: 01/01/2008

There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict ∼1,500 nsSNPs affect protein function and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half tend to have lengths that are a multiple of three, which causes insertions/deletions of amino acids in the corresponding protein, rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to hone in on possible functional variation

CiteSeerX

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Evaluation of next generation sequencing platforms for population targeted sequencing studies

Author: Beeson Karen Y
Frazer Kelly A
Harismendy Olivier
Levy Samuel
Murray Sarah S
Ng Pauline C
Schork Nicholas J
Stockwell Timothy B
Strausberg Robert L
Topol Eric J
Wang Xiaoyun
Publication venue: BioMed Central
Publication date: 01/01/2009

Human sequence generated from three next-generation sequencing platforms reveals systematic variability in sequence coverage due to local sequence characteristics

Crossref

Springer - Publisher Connector

PubMed Central

Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing

Author: Caballero Otavia L
Camargo Anamaria A
de Souza Sandro J
Edsall Lee
Galante Pedro A
Kirkness Ewen F
Kuan Samantha
Levy Samuel
Parmigiani Raphael B
Ren Bing
Simpson Andrew JG
Strausberg Robert L
Vasconcelos Ana Tereza R
Ye Zhen
Zhao Qi
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: To identify potential tumor suppressor genes, genome-wide data from exome and transcriptome sequencing were combined to search for genes with loss of heterozygosity and allele-specific expression. The analysis was conducted on the breast cancer cell line HCC1954, and a lymphoblast cell line from the same individual, HCC1954BL. Results: By comparing exome sequences from the two cell lines, we identified loss of heterozygosity events at 403 genes in HCC1954 and at one gene in HCC1954BL. The combination of exome and transcriptome sequence data also revealed 86 and 50 genes with allele specific expression events in HCC1954 and HCC1954BL, which comprise 5.4% and 2.6% of genes surveyed, respectively. Many of these genes identified by loss of heterozygosity and allele-specific expression are known or putative tumor suppressor genes, such as BRCA1, MSH3 and SETX, which participate in DNA repair pathways. Conclusions: Our results demonstrate that the combined application of high throughput sequencing to exome and allele-specific transcriptome analysis can reveal genes with known tumor suppressor characteristics, and a shortlist of novel candidates for the study of tumor suppressor activities

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Somatic sequence alterations in twenty-one genes selected by expression profile analysis of breast carcinomas

Author: Burdett Laurie
Børresen-Dale Anne-Lise
Chanock Stephen J
Gerhard Daniela S
Kaaresen Rolf
Kristensen Vessela
Langerød Anita
Llaca Victor
Perou Charles
Presswalla Shafaq
Strausberg Robert L
Yeager Meredith
Publication venue: BioMed Central Ltd
Publication date: 01/01/2007

Introduction: Genomic alterations have been observed in breast carcinomas that affect the capacity of cells to regulate proliferation, signaling, and metastasis. Re-sequence studies have investigated candidate genes based on prior genetic observations (changes in copy number or regions of genetic instability) or other laboratory observations and have defined critical somatic mutations in genes such as TP53 and PIK3CA. Methods: We have extended the paradigm and analyzed 21 genes primarily identified by expression profiling studies, which are useful for breast cancer subtyping and prognosis. This study conducted a bidirectional re-sequence analysis of all exons and 5', 3', and evolutionarily conserved regions (spanning more than 16 megabases) in 91 breast tumor samples. Results: Eighty-seven unique somatic alterations were identified in 16 genes. Seventy-eight were single base pair alterations, of which 23 were missense mutations; 55 were distributed across conserved intronic regions or the 5' and 3' regions. There were nine insertion/deletions. Because there is no a priori way to predict whether any one of the identified synonymous and noncoding somatic alterations disrupt function, analysis unique to each gene will be required to establish whether it is a tumor suppressor gene or whether there is no effect. In five genes, no somatic alterations were observed. Conclusion: The study confirms the value of re-sequence analysis in cancer gene discovery and underscores the importance of characterizing somatic alterations across genes that are related not only by function, or functional pathways, but also based upon expression patterns

Helsebibliotekets Research Archive

PubMed Central

Carolina Digital Repository

NORA - Norwegian Open Research Archives

The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

UNT Digital Library

The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific

The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS

Public Library of Science (PLOS)

Crossref

Repositorio Institucional de la Universidad de Costa Rica

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Evolutionary and biomedical insights from the rhesus macaque genome

The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species

Louisiana State University

ORESTES are enriched in rare exon usage variants affecting the encoded proteins

Author: Andrew J.G. Simpson
André C. Zaiats
Elisson C. Osório
Fábio Passetti
Helena Brentani
Jorge E.S. de Souza
João Paulo Kitajima
Maarten R. Leerkes
Noboru Jo Sakabe
Paulo S.L. de Oliveira
Pedro A.F. Galante
Ricardo R. Brentani
Robert L. Strausberg
Sandro José de Souza
Publication date: 01/01/2003

Comptes Rendus Biologies (CRBIOL)

CR Biologies